We will use European Soccer Database from Kaggle cover 11 European Leagues from 2008:2016 sessions to analsis the following table displays the dataset in details
| Tables names | NO.of rows | NO.of columns |
|---|---|---|
| Country | 11 | 2 |
| League | 11 | 3 |
| Player | 11.1k | 7 |
| Player_Attributes | 184k | 42 |
| Match | 26,000 | 115 |
| Team | 299 | 5 |
| Team_Attributes | 1458 | 25 |
I'm used col player_name from the df "Player" to select mo salah id for merging with df Player_Attributes to explore his performance
I'm selecting the attributes of only Spanish teams by inner join between the Spanish part of Team_df and Team_Attributes_df
Spanish teams by inner join between the Spanish part of Team_df and match df
### Investigate Mohamed's salah performance and career the ups and down if exists from 2008:2016 as the most Egyptian professional player all over the Egyptian football history we will explore his attributes to find which one increase or decrease in what time to relate it with the teams he played for it which one enhanced his performance and which do opposite.
### what are the strongest teams in the Spanish league and the main characters' defense plying style or offensive style? so we will dive the attributes into related styles explore the most important Spanish teams attributes to figure out the top ten strong teams
### There is a reputation about the Spanish League that it lacks competition and that the championship is always between Barcelona and Real Madrid, was it the case in the eight years of research? we will explore the Spanish teams goals and total point per season we will rank them and check the claim accuracy
import numpy as np
import pandas as pd
import sqlite3
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
%matplotlib inline
pd.set_option('display.max_columns', 120)
pd.set_option('display.max_rows', 155)
pd.options.mode.chained_assignment = None
connection = sqlite3.connect("database.sqlite")
tables = pd.read_sql("""SELECT *
FROM sqlite_master
WHERE type='table'
""", connection)
tables.head(2)
| type | name | tbl_name | rootpage | sql | |
|---|---|---|---|---|---|
| 0 | table | sqlite_sequence | sqlite_sequence | 4 | CREATE TABLE sqlite_sequence(name,seq) |
| 1 | table | Player_Attributes | Player_Attributes | 11 | CREATE TABLE "Player_Attributes" (\n\t`id`\tIN... |
players = pd.read_sql(''' SELECT *
FROM Player
''', connection)
players.head(2)
| id | player_api_id | player_name | player_fifa_api_id | birthday | height | weight | |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 505942 | Aaron Appindangoye | 218353 | 1992-02-29 00:00:00 | 182.88 | 187 |
| 1 | 2 | 155782 | Aaron Cresswell | 189615 | 1989-12-15 00:00:00 | 170.18 | 146 |
players.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 11060 entries, 0 to 11059 Data columns (total 7 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 11060 non-null int64 1 player_api_id 11060 non-null int64 2 player_name 11060 non-null object 3 player_fifa_api_id 11060 non-null int64 4 birthday 11060 non-null object 5 height 11060 non-null float64 6 weight 11060 non-null int64 dtypes: float64(1), int64(4), object(2) memory usage: 605.0+ KB
players["height"].describe()
count 11060.000000 mean 181.867445 std 6.369201 min 157.480000 25% 177.800000 50% 182.880000 75% 185.420000 max 208.280000 Name: height, dtype: float64
ax=players.boxplot("height")
#ax.set_xlabel("height")
ax.set_ylabel("NO. of Plyers")
ax.set_title("height Box plot")
Text(0.5, 1.0, 'height Box plot')
perfect normal distrubioation
players["weight"].describe()
count 11060.000000 mean 168.380289 std 14.990217 min 117.000000 25% 159.000000 50% 168.000000 75% 179.000000 max 243.000000 Name: weight, dtype: float64
ax=players["weight"].hist(bins=10)
ax.set_xlabel("weight")
ax.set_ylabel("NO. of Plyers")
ax.set_title("weight distrubtion")
Text(0.5, 1.0, 'weight distrubtion')
T=players["height"].sort_values( ).max()
players.query(f"height =={T}")
| id | player_api_id | player_name | player_fifa_api_id | birthday | height | weight | |
|---|---|---|---|---|---|---|---|
| 5901 | 5908 | 148325 | Kristof van Hout | 185306 | 1987-02-09 00:00:00 | 208.28 | 243 |
H=players["weight"].sort_values( ).max()
players.query(f"weight =={H}")
| id | player_api_id | player_name | player_fifa_api_id | birthday | height | weight | |
|---|---|---|---|---|---|---|---|
| 5901 | 5908 | 148325 | Kristof van Hout | 185306 | 1987-02-09 00:00:00 | 208.28 | 243 |
| 10297 | 10313 | 27313 | Tim Wiese | 53012 | 1981-12-17 00:00:00 | 193.04 | 243 |
Player_Attributes = pd.read_sql(''' SELECT *
FROM Player_Attributes
''', connection)
Player_Attributes.head(2)
| id | player_fifa_api_id | player_api_id | date | overall_rating | potential | preferred_foot | attacking_work_rate | defensive_work_rate | crossing | finishing | heading_accuracy | short_passing | volleys | dribbling | curve | free_kick_accuracy | long_passing | ball_control | acceleration | sprint_speed | agility | reactions | balance | shot_power | jumping | stamina | strength | long_shots | aggression | interceptions | positioning | vision | penalties | marking | standing_tackle | sliding_tackle | gk_diving | gk_handling | gk_kicking | gk_positioning | gk_reflexes | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 218353 | 505942 | 2016-02-18 00:00:00 | 67.0 | 71.0 | right | medium | medium | 49.0 | 44.0 | 71.0 | 61.0 | 44.0 | 51.0 | 45.0 | 39.0 | 64.0 | 49.0 | 60.0 | 64.0 | 59.0 | 47.0 | 65.0 | 55.0 | 58.0 | 54.0 | 76.0 | 35.0 | 71.0 | 70.0 | 45.0 | 54.0 | 48.0 | 65.0 | 69.0 | 69.0 | 6.0 | 11.0 | 10.0 | 8.0 | 8.0 |
| 1 | 2 | 218353 | 505942 | 2015-11-19 00:00:00 | 67.0 | 71.0 | right | medium | medium | 49.0 | 44.0 | 71.0 | 61.0 | 44.0 | 51.0 | 45.0 | 39.0 | 64.0 | 49.0 | 60.0 | 64.0 | 59.0 | 47.0 | 65.0 | 55.0 | 58.0 | 54.0 | 76.0 | 35.0 | 71.0 | 70.0 | 45.0 | 54.0 | 48.0 | 65.0 | 69.0 | 69.0 | 6.0 | 11.0 | 10.0 | 8.0 | 8.0 |
Player_Attributes.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 183978 entries, 0 to 183977 Data columns (total 42 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 183978 non-null int64 1 player_fifa_api_id 183978 non-null int64 2 player_api_id 183978 non-null int64 3 date 183978 non-null object 4 overall_rating 183142 non-null float64 5 potential 183142 non-null float64 6 preferred_foot 183142 non-null object 7 attacking_work_rate 180748 non-null object 8 defensive_work_rate 183142 non-null object 9 crossing 183142 non-null float64 10 finishing 183142 non-null float64 11 heading_accuracy 183142 non-null float64 12 short_passing 183142 non-null float64 13 volleys 181265 non-null float64 14 dribbling 183142 non-null float64 15 curve 181265 non-null float64 16 free_kick_accuracy 183142 non-null float64 17 long_passing 183142 non-null float64 18 ball_control 183142 non-null float64 19 acceleration 183142 non-null float64 20 sprint_speed 183142 non-null float64 21 agility 181265 non-null float64 22 reactions 183142 non-null float64 23 balance 181265 non-null float64 24 shot_power 183142 non-null float64 25 jumping 181265 non-null float64 26 stamina 183142 non-null float64 27 strength 183142 non-null float64 28 long_shots 183142 non-null float64 29 aggression 183142 non-null float64 30 interceptions 183142 non-null float64 31 positioning 183142 non-null float64 32 vision 181265 non-null float64 33 penalties 183142 non-null float64 34 marking 183142 non-null float64 35 standing_tackle 183142 non-null float64 36 sliding_tackle 181265 non-null float64 37 gk_diving 183142 non-null float64 38 gk_handling 183142 non-null float64 39 gk_kicking 183142 non-null float64 40 gk_positioning 183142 non-null float64 41 gk_reflexes 183142 non-null float64 dtypes: float64(35), int64(3), object(4) memory usage: 59.0+ MB
dis_patt = Player_Attributes.iloc[:,np.r_[4,5, 9 :42] ].describe()
dis_patt.iloc[np.r_[1,3,4,5,6,7] , :]
| overall_rating | potential | crossing | finishing | heading_accuracy | short_passing | volleys | dribbling | curve | free_kick_accuracy | long_passing | ball_control | acceleration | sprint_speed | agility | reactions | balance | shot_power | jumping | stamina | strength | long_shots | aggression | interceptions | positioning | vision | penalties | marking | standing_tackle | sliding_tackle | gk_diving | gk_handling | gk_kicking | gk_positioning | gk_reflexes | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mean | 68.600015 | 73.460353 | 55.086883 | 49.921078 | 57.266023 | 62.429672 | 49.468436 | 59.175154 | 52.965675 | 49.38095 | 57.06988 | 63.388879 | 67.659357 | 68.051244 | 65.97091 | 66.103706 | 65.189496 | 61.808427 | 66.969045 | 67.038544 | 67.424529 | 53.339431 | 60.948046 | 52.009271 | 55.786504 | 57.87355 | 55.003986 | 46.772242 | 50.351257 | 48.001462 | 14.704393 | 16.063612 | 20.998362 | 16.132154 | 16.441439 |
| min | 33.000000 | 39.000000 | 1.000000 | 1.000000 | 1.000000 | 3.000000 | 1.000000 | 1.000000 | 2.000000 | 1.00000 | 3.00000 | 5.000000 | 10.000000 | 12.000000 | 11.00000 | 17.000000 | 12.000000 | 2.000000 | 14.000000 | 10.000000 | 10.000000 | 1.000000 | 6.000000 | 1.000000 | 2.000000 | 1.00000 | 2.000000 | 1.000000 | 1.000000 | 2.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 25% | 64.000000 | 69.000000 | 45.000000 | 34.000000 | 49.000000 | 57.000000 | 35.000000 | 52.000000 | 41.000000 | 36.00000 | 49.00000 | 58.000000 | 61.000000 | 62.000000 | 58.00000 | 61.000000 | 58.000000 | 54.000000 | 60.000000 | 61.000000 | 60.000000 | 41.000000 | 51.000000 | 34.000000 | 45.000000 | 49.00000 | 45.000000 | 25.000000 | 29.000000 | 25.000000 | 7.000000 | 8.000000 | 8.000000 | 8.000000 | 8.000000 |
| 50% | 69.000000 | 74.000000 | 59.000000 | 53.000000 | 60.000000 | 65.000000 | 52.000000 | 64.000000 | 56.000000 | 50.00000 | 59.00000 | 67.000000 | 69.000000 | 69.000000 | 68.00000 | 67.000000 | 67.000000 | 65.000000 | 68.000000 | 69.000000 | 69.000000 | 58.000000 | 64.000000 | 57.000000 | 60.000000 | 60.00000 | 57.000000 | 50.000000 | 56.000000 | 53.000000 | 10.000000 | 11.000000 | 12.000000 | 11.000000 | 11.000000 |
| 75% | 73.000000 | 78.000000 | 68.000000 | 65.000000 | 68.000000 | 72.000000 | 64.000000 | 72.000000 | 67.000000 | 63.00000 | 67.00000 | 73.000000 | 77.000000 | 77.000000 | 75.00000 | 72.000000 | 74.000000 | 73.000000 | 74.000000 | 76.000000 | 76.000000 | 67.000000 | 73.000000 | 68.000000 | 69.000000 | 69.00000 | 67.000000 | 66.000000 | 69.000000 | 67.000000 | 13.000000 | 15.000000 | 15.000000 | 15.000000 | 15.000000 |
| max | 94.000000 | 97.000000 | 95.000000 | 97.000000 | 98.000000 | 97.000000 | 93.000000 | 97.000000 | 94.000000 | 97.00000 | 97.00000 | 97.000000 | 97.000000 | 97.000000 | 96.00000 | 96.000000 | 96.000000 | 97.000000 | 96.000000 | 96.000000 | 96.000000 | 96.000000 | 97.000000 | 96.000000 | 96.000000 | 97.00000 | 96.000000 | 96.000000 | 95.000000 | 95.000000 | 94.000000 | 93.000000 | 97.000000 | 96.000000 | 96.000000 |
Team_df = pd.read_sql(''' SELECT *
FROM Team
''', connection)
Team_df.head(2)
| id | team_api_id | team_fifa_api_id | team_long_name | team_short_name | |
|---|---|---|---|---|---|
| 0 | 1 | 9987 | 673.0 | KRC Genk | GEN |
| 1 | 2 | 9993 | 675.0 | Beerschot AC | BAC |
Team_df.info() #team_fifa_api_id null
<class 'pandas.core.frame.DataFrame'> RangeIndex: 299 entries, 0 to 298 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 299 non-null int64 1 team_api_id 299 non-null int64 2 team_fifa_api_id 288 non-null float64 3 team_long_name 299 non-null object 4 team_short_name 299 non-null object dtypes: float64(1), int64(2), object(2) memory usage: 11.8+ KB
Team_Attributes_df = pd.read_sql(''' SELECT *
FROM Team_Attributes
''', connection)
Team_Attributes_df.head(2)
| id | team_fifa_api_id | team_api_id | date | buildUpPlaySpeed | buildUpPlaySpeedClass | buildUpPlayDribbling | buildUpPlayDribblingClass | buildUpPlayPassing | buildUpPlayPassingClass | buildUpPlayPositioningClass | chanceCreationPassing | chanceCreationPassingClass | chanceCreationCrossing | chanceCreationCrossingClass | chanceCreationShooting | chanceCreationShootingClass | chanceCreationPositioningClass | defencePressure | defencePressureClass | defenceAggression | defenceAggressionClass | defenceTeamWidth | defenceTeamWidthClass | defenceDefenderLineClass | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 434 | 9930 | 2010-02-22 00:00:00 | 60 | Balanced | NaN | Little | 50 | Mixed | Organised | 60 | Normal | 65 | Normal | 55 | Normal | Organised | 50 | Medium | 55 | Press | 45 | Normal | Cover |
| 1 | 2 | 434 | 9930 | 2014-09-19 00:00:00 | 52 | Balanced | 48.0 | Normal | 56 | Mixed | Organised | 54 | Normal | 63 | Normal | 64 | Normal | Organised | 47 | Medium | 44 | Press | 54 | Normal | Cover |
Team_Attributes_df.info() #buildUpPlayDribbling null
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1458 entries, 0 to 1457 Data columns (total 25 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 1458 non-null int64 1 team_fifa_api_id 1458 non-null int64 2 team_api_id 1458 non-null int64 3 date 1458 non-null object 4 buildUpPlaySpeed 1458 non-null int64 5 buildUpPlaySpeedClass 1458 non-null object 6 buildUpPlayDribbling 489 non-null float64 7 buildUpPlayDribblingClass 1458 non-null object 8 buildUpPlayPassing 1458 non-null int64 9 buildUpPlayPassingClass 1458 non-null object 10 buildUpPlayPositioningClass 1458 non-null object 11 chanceCreationPassing 1458 non-null int64 12 chanceCreationPassingClass 1458 non-null object 13 chanceCreationCrossing 1458 non-null int64 14 chanceCreationCrossingClass 1458 non-null object 15 chanceCreationShooting 1458 non-null int64 16 chanceCreationShootingClass 1458 non-null object 17 chanceCreationPositioningClass 1458 non-null object 18 defencePressure 1458 non-null int64 19 defencePressureClass 1458 non-null object 20 defenceAggression 1458 non-null int64 21 defenceAggressionClass 1458 non-null object 22 defenceTeamWidth 1458 non-null int64 23 defenceTeamWidthClass 1458 non-null object 24 defenceDefenderLineClass 1458 non-null object dtypes: float64(1), int64(11), object(13) memory usage: 284.9+ KB
Team_Attributes_df.iloc[: ,np.r_ [4,7,10,12,14,17,19,21]]
| buildUpPlaySpeed | buildUpPlayDribblingClass | buildUpPlayPositioningClass | chanceCreationPassingClass | chanceCreationCrossingClass | chanceCreationPositioningClass | defencePressureClass | defenceAggressionClass | |
|---|---|---|---|---|---|---|---|---|
| 0 | 60 | Little | Organised | Normal | Normal | Organised | Medium | Press |
| 1 | 52 | Normal | Organised | Normal | Normal | Organised | Medium | Press |
| 2 | 47 | Normal | Organised | Normal | Normal | Organised | Medium | Press |
| 3 | 70 | Little | Organised | Risky | Lots | Organised | Medium | Double |
| 4 | 47 | Little | Organised | Normal | Normal | Organised | Medium | Press |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1453 | 52 | Little | Organised | Normal | Normal | Organised | Medium | Press |
| 1454 | 54 | Little | Organised | Normal | Normal | Organised | Medium | Press |
| 1455 | 54 | Little | Organised | Normal | Normal | Organised | Medium | Press |
| 1456 | 54 | Normal | Organised | Normal | Normal | Organised | Medium | Press |
| 1457 | 54 | Normal | Organised | Normal | Normal | Organised | Medium | Press |
1458 rows × 8 columns
match_df = pd.read_sql(''' SELECT *
FROM Match
''', connection)
match_df.head(2)
| id | country_id | league_id | season | stage | date | match_api_id | home_team_api_id | away_team_api_id | home_team_goal | away_team_goal | home_player_X1 | home_player_X2 | home_player_X3 | home_player_X4 | home_player_X5 | home_player_X6 | home_player_X7 | home_player_X8 | home_player_X9 | home_player_X10 | home_player_X11 | away_player_X1 | away_player_X2 | away_player_X3 | away_player_X4 | away_player_X5 | away_player_X6 | away_player_X7 | away_player_X8 | away_player_X9 | away_player_X10 | away_player_X11 | home_player_Y1 | home_player_Y2 | home_player_Y3 | home_player_Y4 | home_player_Y5 | home_player_Y6 | home_player_Y7 | home_player_Y8 | home_player_Y9 | home_player_Y10 | home_player_Y11 | away_player_Y1 | away_player_Y2 | away_player_Y3 | away_player_Y4 | away_player_Y5 | away_player_Y6 | away_player_Y7 | away_player_Y8 | away_player_Y9 | away_player_Y10 | away_player_Y11 | home_player_1 | home_player_2 | home_player_3 | home_player_4 | home_player_5 | home_player_6 | home_player_7 | home_player_8 | home_player_9 | home_player_10 | home_player_11 | away_player_1 | away_player_2 | away_player_3 | away_player_4 | away_player_5 | away_player_6 | away_player_7 | away_player_8 | away_player_9 | away_player_10 | away_player_11 | goal | shoton | shotoff | foulcommit | card | cross | corner | possession | B365H | B365D | B365A | BWH | BWD | BWA | IWH | IWD | IWA | LBH | LBD | LBA | PSH | PSD | PSA | WHH | WHD | WHA | SJH | SJD | SJA | VCH | VCD | VCA | GBH | GBD | GBA | BSH | BSD | BSA | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 2008/2009 | 1 | 2008-08-17 00:00:00 | 492473 | 9987 | 9993 | 1 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None | None | None | None | None | None | None | None | 1.73 | 3.4 | 5.0 | 1.75 | 3.35 | 4.20 | 1.85 | 3.2 | 3.5 | 1.8 | 3.3 | 3.75 | NaN | NaN | NaN | 1.70 | 3.3 | 4.33 | 1.90 | 3.3 | 4.0 | 1.65 | 3.40 | 4.50 | 1.78 | 3.25 | 4.00 | 1.73 | 3.40 | 4.2 |
| 1 | 2 | 1 | 1 | 2008/2009 | 1 | 2008-08-16 00:00:00 | 492474 | 10000 | 9994 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None | None | None | None | None | None | None | None | 1.95 | 3.2 | 3.6 | 1.80 | 3.30 | 3.95 | 1.90 | 3.2 | 3.5 | 1.9 | 3.2 | 3.50 | NaN | NaN | NaN | 1.83 | 3.3 | 3.60 | 1.95 | 3.3 | 3.8 | 2.00 | 3.25 | 3.25 | 1.85 | 3.25 | 3.75 | 1.91 | 3.25 | 3.6 |
match_df.info(verbose=True, show_counts=True ) # too meny null bute we will not use them
#match_df.iloc[:,np.r_[5,3, 9,10 ] ]
<class 'pandas.core.frame.DataFrame'> RangeIndex: 25979 entries, 0 to 25978 Data columns (total 115 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 25979 non-null int64 1 country_id 25979 non-null int64 2 league_id 25979 non-null int64 3 season 25979 non-null object 4 stage 25979 non-null int64 5 date 25979 non-null object 6 match_api_id 25979 non-null int64 7 home_team_api_id 25979 non-null int64 8 away_team_api_id 25979 non-null int64 9 home_team_goal 25979 non-null int64 10 away_team_goal 25979 non-null int64 11 home_player_X1 24158 non-null float64 12 home_player_X2 24158 non-null float64 13 home_player_X3 24147 non-null float64 14 home_player_X4 24147 non-null float64 15 home_player_X5 24147 non-null float64 16 home_player_X6 24147 non-null float64 17 home_player_X7 24147 non-null float64 18 home_player_X8 24147 non-null float64 19 home_player_X9 24147 non-null float64 20 home_player_X10 24147 non-null float64 21 home_player_X11 24147 non-null float64 22 away_player_X1 24147 non-null float64 23 away_player_X2 24147 non-null float64 24 away_player_X3 24147 non-null float64 25 away_player_X4 24147 non-null float64 26 away_player_X5 24147 non-null float64 27 away_player_X6 24147 non-null float64 28 away_player_X7 24147 non-null float64 29 away_player_X8 24147 non-null float64 30 away_player_X9 24146 non-null float64 31 away_player_X10 24146 non-null float64 32 away_player_X11 24140 non-null float64 33 home_player_Y1 24158 non-null float64 34 home_player_Y2 24158 non-null float64 35 home_player_Y3 24147 non-null float64 36 home_player_Y4 24147 non-null float64 37 home_player_Y5 24147 non-null float64 38 home_player_Y6 24147 non-null float64 39 home_player_Y7 24147 non-null float64 40 home_player_Y8 24147 non-null float64 41 home_player_Y9 24147 non-null float64 42 home_player_Y10 24147 non-null float64 43 home_player_Y11 24147 non-null float64 44 away_player_Y1 24147 non-null float64 45 away_player_Y2 24147 non-null float64 46 away_player_Y3 24147 non-null float64 47 away_player_Y4 24147 non-null float64 48 away_player_Y5 24147 non-null float64 49 away_player_Y6 24147 non-null float64 50 away_player_Y7 24147 non-null float64 51 away_player_Y8 24147 non-null float64 52 away_player_Y9 24146 non-null float64 53 away_player_Y10 24146 non-null float64 54 away_player_Y11 24140 non-null float64 55 home_player_1 24755 non-null float64 56 home_player_2 24664 non-null float64 57 home_player_3 24698 non-null float64 58 home_player_4 24656 non-null float64 59 home_player_5 24663 non-null float64 60 home_player_6 24654 non-null float64 61 home_player_7 24752 non-null float64 62 home_player_8 24670 non-null float64 63 home_player_9 24706 non-null float64 64 home_player_10 24543 non-null float64 65 home_player_11 24424 non-null float64 66 away_player_1 24745 non-null float64 67 away_player_2 24701 non-null float64 68 away_player_3 24686 non-null float64 69 away_player_4 24658 non-null float64 70 away_player_5 24644 non-null float64 71 away_player_6 24666 non-null float64 72 away_player_7 24744 non-null float64 73 away_player_8 24638 non-null float64 74 away_player_9 24651 non-null float64 75 away_player_10 24538 non-null float64 76 away_player_11 24425 non-null float64 77 goal 14217 non-null object 78 shoton 14217 non-null object 79 shotoff 14217 non-null object 80 foulcommit 14217 non-null object 81 card 14217 non-null object 82 cross 14217 non-null object 83 corner 14217 non-null object 84 possession 14217 non-null object 85 B365H 22592 non-null float64 86 B365D 22592 non-null float64 87 B365A 22592 non-null float64 88 BWH 22575 non-null float64 89 BWD 22575 non-null float64 90 BWA 22575 non-null float64 91 IWH 22520 non-null float64 92 IWD 22520 non-null float64 93 IWA 22520 non-null float64 94 LBH 22556 non-null float64 95 LBD 22556 non-null float64 96 LBA 22556 non-null float64 97 PSH 11168 non-null float64 98 PSD 11168 non-null float64 99 PSA 11168 non-null float64 100 WHH 22571 non-null float64 101 WHD 22571 non-null float64 102 WHA 22571 non-null float64 103 SJH 17097 non-null float64 104 SJD 17097 non-null float64 105 SJA 17097 non-null float64 106 VCH 22568 non-null float64 107 VCD 22568 non-null float64 108 VCA 22568 non-null float64 109 GBH 14162 non-null float64 110 GBD 14162 non-null float64 111 GBA 14162 non-null float64 112 BSH 14161 non-null float64 113 BSD 14161 non-null float64 114 BSA 14161 non-null float64 dtypes: float64(96), int64(9), object(10) memory usage: 22.8+ MB
match_df.iloc[:,np.r_[5,3, 9,10 ] ].describe()
| home_team_goal | away_team_goal | |
|---|---|---|
| count | 25979.000000 | 25979.000000 |
| mean | 1.544594 | 1.160938 |
| std | 1.297158 | 1.142110 |
| min | 0.000000 | 0.000000 |
| 25% | 1.000000 | 0.000000 |
| 50% | 1.000000 | 1.000000 |
| 75% | 2.000000 | 2.000000 |
| max | 10.000000 | 9.000000 |
match_df["home_team_goal"].hist(bins=9);
# ax=players["weight"].hist()
# ax.set_xlabel("weight")
# ax.set_ylabel("NO. of Plyers")
# ax.set_title("weight distrubtion")
match_df["away_team_goal"].hist(bins=9);
#define the function
def drop_forty_null_column(df):
""" drop any columns contain 40% or more of null values
Args: df - pandas DataFrame
return the modified df - pandas DataFrame
"""
col_list = df.columns
while True:
for col in col_list :
if (df[col].isnull().sum()/len(df)) >= 0.4 :
df.drop(columns=[col],inplace =True )
else :
continue
return df
Team_Attributes_df.shape
(1458, 25)
drop_forty_null_column(Team_Attributes_df)
Team_Attributes_df.shape # drop buildUpPlayDribbling col
(1458, 24)
match_df.shape
(25979, 115)
drop_forty_null_column(match_df)
match_df.shape # alots of cols in this DF have 40% or more of null values
(25979, 98)
Player_Att_numric=Player_Attributes.iloc[:,np.r_[2,3,4,5, 9 :42] ]
merged_plyerName_plyerAtt = players.merge(Player_Att_numric, on="player_api_id",suffixes =("_name","_att"))
let's chek mo' salah df for nulls values
mo_salah_df=merged_plyerName_plyerAtt[merged_plyerName_plyerAtt["player_name"]== "Mohamed Salah"]
# lets chik if any null values !
mo_salah_df.isnull().values.any()
mo_salah_df.duplicated().sum()
0
mo_salah_df.duplicated().sum()
0
# let's have adeepr look of mo's atributes
mo_salah_explor_df =mo_salah_df.iloc[: , np.r_ [8:40 ]]
plt.figure(figsize=(12,15), dpi= 180)
mo_mean=mo_salah_df.iloc[:,np.r_[ 9 :33] ].mean()
total_mean =Player_Attributes.iloc[:,np.r_[ 9 :34] ].mean()
fig, ax = plt.subplots(figsize =(8,6))
ax.hist(total_mean, alpha=0.5, label='genrral mean', bins =7)
ax.hist(mo_mean, alpha=0.5, label='mo_salah mean',bins =7)
ax.set_title("Mo Salah attributes average vs total players attributes average")
ax.set_xlabel('Performance')
ax.set_ylabel('Attributes No.')
ax.legend(loc='upper right')
ax.legend(bbox_to_anchor=(1,1))
plt.show();
<Figure size 2160x2700 with 0 Axes>
# let's have adeepr look of mo's atributes
mo_salah_explor_df =mo_salah_df.iloc[: , np.r_ [7:33 ]]
mo_salah_explor_df
| date | overall_rating | potential | crossing | finishing | heading_accuracy | short_passing | volleys | dribbling | curve | free_kick_accuracy | long_passing | ball_control | acceleration | sprint_speed | agility | reactions | balance | shot_power | jumping | stamina | strength | long_shots | aggression | interceptions | positioning | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 128023 | 2016-05-05 00:00:00 | 80.0 | 85.0 | 68.0 | 77.0 | 59.0 | 75.0 | 69.0 | 86.0 | 59.0 | 60.0 | 63.0 | 86.0 | 92.0 | 91.0 | 86.0 | 80.0 | 84.0 | 71.0 | 70.0 | 77.0 | 60.0 | 61.0 | 63.0 | 55.0 | 77.0 |
| 128024 | 2016-04-21 00:00:00 | 80.0 | 85.0 | 68.0 | 77.0 | 59.0 | 75.0 | 69.0 | 86.0 | 59.0 | 60.0 | 63.0 | 86.0 | 92.0 | 91.0 | 86.0 | 75.0 | 84.0 | 75.0 | 70.0 | 77.0 | 60.0 | 61.0 | 63.0 | 55.0 | 77.0 |
| 128025 | 2015-12-17 00:00:00 | 80.0 | 85.0 | 68.0 | 77.0 | 59.0 | 75.0 | 69.0 | 86.0 | 64.0 | 68.0 | 67.0 | 86.0 | 92.0 | 91.0 | 86.0 | 75.0 | 84.0 | 75.0 | 70.0 | 77.0 | 60.0 | 61.0 | 63.0 | 55.0 | 77.0 |
| 128026 | 2015-11-12 00:00:00 | 80.0 | 85.0 | 68.0 | 77.0 | 59.0 | 75.0 | 69.0 | 86.0 | 64.0 | 68.0 | 67.0 | 86.0 | 92.0 | 91.0 | 86.0 | 75.0 | 84.0 | 75.0 | 70.0 | 77.0 | 60.0 | 61.0 | 63.0 | 55.0 | 77.0 |
| 128027 | 2015-10-09 00:00:00 | 80.0 | 87.0 | 68.0 | 77.0 | 59.0 | 75.0 | 69.0 | 86.0 | 64.0 | 68.0 | 67.0 | 86.0 | 92.0 | 91.0 | 86.0 | 75.0 | 84.0 | 75.0 | 70.0 | 77.0 | 60.0 | 61.0 | 63.0 | 55.0 | 77.0 |
| 128028 | 2015-09-21 00:00:00 | 80.0 | 87.0 | 68.0 | 77.0 | 59.0 | 75.0 | 69.0 | 86.0 | 64.0 | 68.0 | 67.0 | 86.0 | 92.0 | 91.0 | 86.0 | 75.0 | 84.0 | 75.0 | 70.0 | 77.0 | 60.0 | 61.0 | 63.0 | 55.0 | 77.0 |
| 128029 | 2015-07-03 00:00:00 | 77.0 | 84.0 | 67.0 | 76.0 | 58.0 | 74.0 | 68.0 | 85.0 | 63.0 | 67.0 | 66.0 | 85.0 | 92.0 | 94.0 | 86.0 | 74.0 | 84.0 | 74.0 | 70.0 | 77.0 | 60.0 | 60.0 | 62.0 | 54.0 | 76.0 |
| 128030 | 2015-04-24 00:00:00 | 77.0 | 84.0 | 67.0 | 76.0 | 58.0 | 74.0 | 68.0 | 85.0 | 63.0 | 67.0 | 66.0 | 85.0 | 92.0 | 94.0 | 86.0 | 74.0 | 84.0 | 74.0 | 70.0 | 77.0 | 60.0 | 60.0 | 62.0 | 54.0 | 76.0 |
| 128031 | 2015-03-20 00:00:00 | 76.0 | 84.0 | 67.0 | 74.0 | 58.0 | 74.0 | 68.0 | 85.0 | 63.0 | 67.0 | 66.0 | 81.0 | 92.0 | 94.0 | 86.0 | 70.0 | 84.0 | 74.0 | 70.0 | 77.0 | 60.0 | 50.0 | 62.0 | 58.0 | 74.0 |
| 128032 | 2015-03-13 00:00:00 | 76.0 | 84.0 | 67.0 | 74.0 | 58.0 | 74.0 | 68.0 | 85.0 | 63.0 | 67.0 | 66.0 | 81.0 | 92.0 | 94.0 | 86.0 | 70.0 | 84.0 | 74.0 | 70.0 | 77.0 | 60.0 | 50.0 | 62.0 | 58.0 | 74.0 |
| 128033 | 2015-03-06 00:00:00 | 76.0 | 84.0 | 67.0 | 68.0 | 58.0 | 74.0 | 60.0 | 85.0 | 63.0 | 67.0 | 66.0 | 81.0 | 92.0 | 94.0 | 86.0 | 70.0 | 80.0 | 74.0 | 70.0 | 77.0 | 59.0 | 50.0 | 62.0 | 58.0 | 74.0 |
| 128034 | 2015-02-06 00:00:00 | 76.0 | 84.0 | 67.0 | 68.0 | 58.0 | 74.0 | 48.0 | 85.0 | 63.0 | 67.0 | 66.0 | 81.0 | 92.0 | 94.0 | 86.0 | 69.0 | 80.0 | 74.0 | 70.0 | 77.0 | 59.0 | 50.0 | 61.0 | 59.0 | 73.0 |
| 128035 | 2014-09-18 00:00:00 | 76.0 | 84.0 | 67.0 | 68.0 | 58.0 | 74.0 | 48.0 | 85.0 | 63.0 | 67.0 | 66.0 | 81.0 | 92.0 | 94.0 | 86.0 | 69.0 | 80.0 | 74.0 | 70.0 | 77.0 | 59.0 | 50.0 | 61.0 | 59.0 | 73.0 |
| 128036 | 2014-05-16 00:00:00 | 77.0 | 87.0 | 65.0 | 68.0 | 58.0 | 74.0 | 48.0 | 86.0 | 63.0 | 67.0 | 66.0 | 89.0 | 92.0 | 94.0 | 86.0 | 68.0 | 80.0 | 74.0 | 70.0 | 77.0 | 58.0 | 50.0 | 61.0 | 59.0 | 73.0 |
| 128037 | 2014-01-31 00:00:00 | 77.0 | 87.0 | 65.0 | 68.0 | 58.0 | 74.0 | 48.0 | 86.0 | 63.0 | 67.0 | 66.0 | 89.0 | 92.0 | 94.0 | 86.0 | 68.0 | 80.0 | 74.0 | 70.0 | 77.0 | 58.0 | 50.0 | 61.0 | 59.0 | 73.0 |
| 128038 | 2014-01-10 00:00:00 | 75.0 | 86.0 | 61.0 | 68.0 | 58.0 | 74.0 | 48.0 | 82.0 | 63.0 | 67.0 | 61.0 | 87.0 | 92.0 | 93.0 | 78.0 | 68.0 | 71.0 | 68.0 | 70.0 | 77.0 | 58.0 | 50.0 | 61.0 | 59.0 | 73.0 |
| 128039 | 2013-12-13 00:00:00 | 75.0 | 85.0 | 61.0 | 68.0 | 58.0 | 74.0 | 48.0 | 82.0 | 63.0 | 67.0 | 61.0 | 87.0 | 92.0 | 91.0 | 78.0 | 68.0 | 71.0 | 68.0 | 70.0 | 77.0 | 58.0 | 50.0 | 61.0 | 59.0 | 73.0 |
| 128040 | 2013-11-01 00:00:00 | 75.0 | 85.0 | 61.0 | 68.0 | 48.0 | 74.0 | 48.0 | 82.0 | 63.0 | 67.0 | 61.0 | 87.0 | 92.0 | 91.0 | 78.0 | 68.0 | 71.0 | 68.0 | 70.0 | 77.0 | 58.0 | 50.0 | 61.0 | 59.0 | 73.0 |
| 128041 | 2013-10-04 00:00:00 | 75.0 | 85.0 | 61.0 | 68.0 | 48.0 | 74.0 | 48.0 | 82.0 | 63.0 | 67.0 | 61.0 | 87.0 | 92.0 | 91.0 | 73.0 | 68.0 | 60.0 | 68.0 | 70.0 | 77.0 | 56.0 | 50.0 | 61.0 | 59.0 | 73.0 |
| 128042 | 2013-09-20 00:00:00 | 74.0 | 85.0 | 61.0 | 67.0 | 48.0 | 74.0 | 48.0 | 82.0 | 63.0 | 67.0 | 60.0 | 86.0 | 92.0 | 91.0 | 73.0 | 68.0 | 54.0 | 68.0 | 70.0 | 77.0 | 56.0 | 50.0 | 61.0 | 59.0 | 73.0 |
| 128043 | 2013-03-22 00:00:00 | 73.0 | 82.0 | 61.0 | 66.0 | 48.0 | 74.0 | 46.0 | 82.0 | 56.0 | 59.0 | 60.0 | 79.0 | 89.0 | 85.0 | 73.0 | 68.0 | 54.0 | 66.0 | 70.0 | 72.0 | 56.0 | 51.0 | 61.0 | 59.0 | 71.0 |
| 128044 | 2013-03-08 00:00:00 | 73.0 | 82.0 | 61.0 | 66.0 | 48.0 | 74.0 | 46.0 | 82.0 | 56.0 | 59.0 | 60.0 | 79.0 | 89.0 | 85.0 | 73.0 | 68.0 | 54.0 | 66.0 | 70.0 | 72.0 | 56.0 | 51.0 | 61.0 | 59.0 | 71.0 |
| 128045 | 2013-02-15 00:00:00 | 70.0 | 82.0 | 61.0 | 66.0 | 48.0 | 74.0 | 46.0 | 79.0 | 56.0 | 59.0 | 60.0 | 68.0 | 75.0 | 81.0 | 73.0 | 68.0 | 54.0 | 66.0 | 70.0 | 72.0 | 56.0 | 51.0 | 61.0 | 59.0 | 71.0 |
| 128046 | 2012-08-31 00:00:00 | 70.0 | 82.0 | 61.0 | 66.0 | 48.0 | 74.0 | 46.0 | 79.0 | 56.0 | 59.0 | 60.0 | 68.0 | 75.0 | 81.0 | 73.0 | 68.0 | 54.0 | 66.0 | 70.0 | 72.0 | 56.0 | 51.0 | 61.0 | 59.0 | 73.0 |
| 128047 | 2007-02-22 00:00:00 | 70.0 | 82.0 | 61.0 | 66.0 | 48.0 | 74.0 | 46.0 | 79.0 | 56.0 | 59.0 | 60.0 | 68.0 | 75.0 | 81.0 | 73.0 | 68.0 | 54.0 | 66.0 | 70.0 | 72.0 | 56.0 | 51.0 | 61.0 | 59.0 | 73.0 |
mo_salah_explor_df.describe()
| overall_rating | potential | crossing | finishing | heading_accuracy | short_passing | volleys | dribbling | curve | free_kick_accuracy | long_passing | ball_control | acceleration | sprint_speed | agility | reactions | balance | shot_power | jumping | stamina | strength | long_shots | aggression | interceptions | positioning | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 25.000000 | 25.000000 | 25.000000 | 25.000000 | 25.000000 | 25.00000 | 25.000000 | 25.000000 | 25.000000 | 25.000000 | 25.000000 | 25.00000 | 25.000000 | 25.000000 | 25.000000 | 25.000000 | 25.000000 | 25.000000 | 25.0 | 25.000000 | 25.000000 | 25.000 | 25.000000 | 25.000000 | 25.000000 |
| mean | 75.920000 | 84.480000 | 64.680000 | 70.840000 | 55.040000 | 74.24000 | 56.320000 | 83.760000 | 61.440000 | 65.000000 | 63.680000 | 82.60000 | 89.720000 | 90.480000 | 81.400000 | 70.680000 | 73.480000 | 71.280000 | 70.0 | 76.000000 | 58.360000 | 53.640 | 61.680000 | 57.520000 | 74.080000 |
| std | 3.121431 | 1.610383 | 3.158586 | 4.588754 | 4.945368 | 0.43589 | 10.566141 | 2.420055 | 3.028751 | 3.662877 | 2.911472 | 6.22495 | 5.608921 | 4.331282 | 5.937171 | 3.508561 | 12.576831 | 3.702702 | 0.0 | 2.041241 | 1.680278 | 4.999 | 0.852447 | 1.981582 | 2.039608 |
| min | 70.000000 | 82.000000 | 61.000000 | 66.000000 | 48.000000 | 74.00000 | 46.000000 | 79.000000 | 56.000000 | 59.000000 | 60.000000 | 68.00000 | 75.000000 | 81.000000 | 73.000000 | 68.000000 | 54.000000 | 66.000000 | 70.0 | 72.000000 | 56.000000 | 50.000 | 61.000000 | 54.000000 | 71.000000 |
| 25% | 75.000000 | 84.000000 | 61.000000 | 68.000000 | 48.000000 | 74.00000 | 48.000000 | 82.000000 | 59.000000 | 60.000000 | 61.000000 | 81.00000 | 92.000000 | 91.000000 | 73.000000 | 68.000000 | 60.000000 | 68.000000 | 70.0 | 77.000000 | 56.000000 | 50.000 | 61.000000 | 55.000000 | 73.000000 |
| 50% | 76.000000 | 85.000000 | 67.000000 | 68.000000 | 58.000000 | 74.00000 | 48.000000 | 85.000000 | 63.000000 | 67.000000 | 66.000000 | 86.00000 | 92.000000 | 91.000000 | 86.000000 | 69.000000 | 80.000000 | 74.000000 | 70.0 | 77.000000 | 59.000000 | 51.000 | 61.000000 | 59.000000 | 73.000000 |
| 75% | 77.000000 | 85.000000 | 67.000000 | 76.000000 | 58.000000 | 74.00000 | 68.000000 | 86.000000 | 63.000000 | 67.000000 | 66.000000 | 86.00000 | 92.000000 | 94.000000 | 86.000000 | 74.000000 | 84.000000 | 74.000000 | 70.0 | 77.000000 | 60.000000 | 60.000 | 62.000000 | 59.000000 | 76.000000 |
| max | 80.000000 | 87.000000 | 68.000000 | 77.000000 | 59.000000 | 75.00000 | 69.000000 | 86.000000 | 64.000000 | 68.000000 | 67.000000 | 89.00000 | 92.000000 | 94.000000 | 86.000000 | 80.000000 | 84.000000 | 75.000000 | 70.0 | 77.000000 | 60.000000 | 61.000 | 63.000000 | 59.000000 | 77.000000 |
mo_df =mo_salah_df.iloc[: , np.r_[7,8,11,12,14,19,20,21,22,23,24]]
mo_df.head(34)
| date | overall_rating | finishing | heading_accuracy | volleys | ball_control | acceleration | sprint_speed | agility | reactions | balance | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 128023 | 2016-05-05 00:00:00 | 80.0 | 77.0 | 59.0 | 69.0 | 86.0 | 92.0 | 91.0 | 86.0 | 80.0 | 84.0 |
| 128024 | 2016-04-21 00:00:00 | 80.0 | 77.0 | 59.0 | 69.0 | 86.0 | 92.0 | 91.0 | 86.0 | 75.0 | 84.0 |
| 128025 | 2015-12-17 00:00:00 | 80.0 | 77.0 | 59.0 | 69.0 | 86.0 | 92.0 | 91.0 | 86.0 | 75.0 | 84.0 |
| 128026 | 2015-11-12 00:00:00 | 80.0 | 77.0 | 59.0 | 69.0 | 86.0 | 92.0 | 91.0 | 86.0 | 75.0 | 84.0 |
| 128027 | 2015-10-09 00:00:00 | 80.0 | 77.0 | 59.0 | 69.0 | 86.0 | 92.0 | 91.0 | 86.0 | 75.0 | 84.0 |
| 128028 | 2015-09-21 00:00:00 | 80.0 | 77.0 | 59.0 | 69.0 | 86.0 | 92.0 | 91.0 | 86.0 | 75.0 | 84.0 |
| 128029 | 2015-07-03 00:00:00 | 77.0 | 76.0 | 58.0 | 68.0 | 85.0 | 92.0 | 94.0 | 86.0 | 74.0 | 84.0 |
| 128030 | 2015-04-24 00:00:00 | 77.0 | 76.0 | 58.0 | 68.0 | 85.0 | 92.0 | 94.0 | 86.0 | 74.0 | 84.0 |
| 128031 | 2015-03-20 00:00:00 | 76.0 | 74.0 | 58.0 | 68.0 | 81.0 | 92.0 | 94.0 | 86.0 | 70.0 | 84.0 |
| 128032 | 2015-03-13 00:00:00 | 76.0 | 74.0 | 58.0 | 68.0 | 81.0 | 92.0 | 94.0 | 86.0 | 70.0 | 84.0 |
| 128033 | 2015-03-06 00:00:00 | 76.0 | 68.0 | 58.0 | 60.0 | 81.0 | 92.0 | 94.0 | 86.0 | 70.0 | 80.0 |
| 128034 | 2015-02-06 00:00:00 | 76.0 | 68.0 | 58.0 | 48.0 | 81.0 | 92.0 | 94.0 | 86.0 | 69.0 | 80.0 |
| 128035 | 2014-09-18 00:00:00 | 76.0 | 68.0 | 58.0 | 48.0 | 81.0 | 92.0 | 94.0 | 86.0 | 69.0 | 80.0 |
| 128036 | 2014-05-16 00:00:00 | 77.0 | 68.0 | 58.0 | 48.0 | 89.0 | 92.0 | 94.0 | 86.0 | 68.0 | 80.0 |
| 128037 | 2014-01-31 00:00:00 | 77.0 | 68.0 | 58.0 | 48.0 | 89.0 | 92.0 | 94.0 | 86.0 | 68.0 | 80.0 |
| 128038 | 2014-01-10 00:00:00 | 75.0 | 68.0 | 58.0 | 48.0 | 87.0 | 92.0 | 93.0 | 78.0 | 68.0 | 71.0 |
| 128039 | 2013-12-13 00:00:00 | 75.0 | 68.0 | 58.0 | 48.0 | 87.0 | 92.0 | 91.0 | 78.0 | 68.0 | 71.0 |
| 128040 | 2013-11-01 00:00:00 | 75.0 | 68.0 | 48.0 | 48.0 | 87.0 | 92.0 | 91.0 | 78.0 | 68.0 | 71.0 |
| 128041 | 2013-10-04 00:00:00 | 75.0 | 68.0 | 48.0 | 48.0 | 87.0 | 92.0 | 91.0 | 73.0 | 68.0 | 60.0 |
| 128042 | 2013-09-20 00:00:00 | 74.0 | 67.0 | 48.0 | 48.0 | 86.0 | 92.0 | 91.0 | 73.0 | 68.0 | 54.0 |
| 128043 | 2013-03-22 00:00:00 | 73.0 | 66.0 | 48.0 | 46.0 | 79.0 | 89.0 | 85.0 | 73.0 | 68.0 | 54.0 |
| 128044 | 2013-03-08 00:00:00 | 73.0 | 66.0 | 48.0 | 46.0 | 79.0 | 89.0 | 85.0 | 73.0 | 68.0 | 54.0 |
| 128045 | 2013-02-15 00:00:00 | 70.0 | 66.0 | 48.0 | 46.0 | 68.0 | 75.0 | 81.0 | 73.0 | 68.0 | 54.0 |
| 128046 | 2012-08-31 00:00:00 | 70.0 | 66.0 | 48.0 | 46.0 | 68.0 | 75.0 | 81.0 | 73.0 | 68.0 | 54.0 |
| 128047 | 2007-02-22 00:00:00 | 70.0 | 66.0 | 48.0 | 46.0 | 68.0 | 75.0 | 81.0 | 73.0 | 68.0 | 54.0 |
mo_df.describe()
| overall_rating | finishing | heading_accuracy | volleys | ball_control | acceleration | sprint_speed | agility | reactions | balance | |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 25.000000 | 25.000000 | 25.000000 | 25.000000 | 25.00000 | 25.000000 | 25.000000 | 25.000000 | 25.000000 | 25.000000 |
| mean | 75.920000 | 70.840000 | 55.040000 | 56.320000 | 82.60000 | 89.720000 | 90.480000 | 81.400000 | 70.680000 | 73.480000 |
| std | 3.121431 | 4.588754 | 4.945368 | 10.566141 | 6.22495 | 5.608921 | 4.331282 | 5.937171 | 3.508561 | 12.576831 |
| min | 70.000000 | 66.000000 | 48.000000 | 46.000000 | 68.00000 | 75.000000 | 81.000000 | 73.000000 | 68.000000 | 54.000000 |
| 25% | 75.000000 | 68.000000 | 48.000000 | 48.000000 | 81.00000 | 92.000000 | 91.000000 | 73.000000 | 68.000000 | 60.000000 |
| 50% | 76.000000 | 68.000000 | 58.000000 | 48.000000 | 86.00000 | 92.000000 | 91.000000 | 86.000000 | 69.000000 | 80.000000 |
| 75% | 77.000000 | 76.000000 | 58.000000 | 68.000000 | 86.00000 | 92.000000 | 94.000000 | 86.000000 | 74.000000 | 84.000000 |
| max | 80.000000 | 77.000000 | 59.000000 | 69.000000 | 89.00000 | 92.000000 | 94.000000 | 86.000000 | 80.000000 | 84.000000 |
plt.figure(figsize=(6,6), dpi= 180)
fig, ax = plt.subplots();
mo_df.overall_rating.hist(bins=4);
ax.set_title("Mo Salah overall_rating")
ax.set_ylabel('frequency')
ax.set_xlabel('overall rating grade');
<Figure size 1080x1080 with 0 Axes>
plt.figure(figsize=(6,6), dpi= 180)
fig, ax = plt.subplots();
mo_df.sprint_speed.hist(bins=3);
ax.set_title("Mo Salah sprint_speed")
ax.set_ylabel('frequency')
ax.set_xlabel('sprint speed grade');
<Figure size 1080x1080 with 0 Axes>
plt.figure(figsize=(6,6), dpi= 180)
fig, ax = plt.subplots();
mo_df.balance.hist(bins=3);
ax.set_title("Mo Salah balance")
ax.set_ylabel('frequency')
ax.set_xlabel('balance grade');
<Figure size 1080x1080 with 0 Axes>
mo_df["date"]=pd.to_datetime(mo_df["date"])
mo_df =mo_df.sort_values(by=['date'], ascending=True)
plt.rcParams.update({'font.size': 19})
cols_names =mo_df.columns[1:]
x = np.arange(len(cols_names))
bar_size = 0.45
#plot the chart
fig, ax = plt.subplots(figsize =(15,12), dpi= 180)
ax.barh(x - bar_size/2,mo_df.min().tolist()[1:], bar_size, label= " Mo Salah minmum " )
ax.barh(x + bar_size/2, mo_df.max().tolist()[1:], bar_size, label='" Mo Salah Maxmum "')
# formating the chart.
ax.set_title("Mo Salah max vs Mo Salah min")
ax.set_ylabel('Attributes')
ax.set_xlabel('GREAD of performanc.' )
ax.legend(loc='upper right')
plt.yticks(x ,cols_names)
ax.legend(bbox_to_anchor=(1,1))
ax.set_label(cols_names)
plt.show()
sns.set_style("darkgrid")
plt.figure(figsize=(20,20), dpi= 180);
fig = px.histogram(data_frame=mo_df , x ="date" ,y = "overall_rating" ,nbins=32
, hover_data =mo_df ,title = "Mohammaed's Salah cumulative overall rating",
cumulative =True ,text_auto =True ,marginal ='box' );
fig.show();
<Figure size 3600x3600 with 0 Axes>
sns.set_style("darkgrid")
plt.figure(figsize=(20,20), dpi= 180);
fig = px.histogram(data_frame=mo_df , x ="date" ,y = "overall_rating" ,nbins=16
, hover_data =mo_df ,title = "Mohammaed's Salah cumulative overall rating",
cumulative =False ,text_auto =True ,marginal ='box' );
fig.show();
<Figure size 3600x3600 with 0 Axes>
fig = px.line(mo_df, x='date', y='overall_rating' )
fig.add_scatter(x=mo_df['date'], y=mo_df['reactions'] ,name ='reactions', mode='lines',line=dict(color="gray"))
fig.add_scatter(x=mo_df['date'], y=mo_df['volleys'],name ="volleys", mode='lines',line=dict(color="black"))
fig.add_scatter(x=mo_df['date'], y=mo_df['balance'],name ="balance", mode='lines')
fig.add_scatter(x=mo_df['date'], y=mo_df['finishing'],name ="finishing", mode='lines')
fig.add_scatter(x=mo_df['date'], y=mo_df["sprint_speed"],name ="sprint_speed", mode='lines',line=dict(color="indigo"))
fig.add_scatter(x=mo_df['date'], y=mo_df['heading_accuracy'],name ="heading accuracy", mode='lines+markers')
fig.add_scatter(x=mo_df['date'], y=mo_df['ball_control'],name ="ball control", mode='lines',line=dict(color="red"))
fig.add_scatter(x=mo_df['date'], y=mo_df['acceleration'],name ="acceleration", mode='lines')
fig.add_hline(y=1, annotation_position="bottom right")
fig.update_layout(legend=dict(orientation="v",yanchor="middle", y=.5,xanchor="left",x=1) , title='<b>Mohamed\'s Salah performance during the period from 2008 to 2016</b>', xaxis_title='<b>DATE</b>', yaxis_title='<b>The scale of performance</b>')
fig.update_layout(yaxis_range=[40,100])
# Add images
fig.update_layout(images=[dict(source="images.jpg",xref="paper", yref="paper",x=0.2, y=.7,sizex=.37, sizey=.8,xanchor="left", yanchor="bottom",
sizing= "contain",opacity= 0.7,visible = True,layer= "below")],hovermode="x unified",autosize=True,
font=dict(family="Helvetica, Bold", size=14,color="Black"))
# Set templates
fig.update_layout(template = "plotly_white")
fig.add_vrect(x0="2013-04-01", x1="2015-4-30",annotation_text="<b>the amazing explosion of performance</b>", fillcolor="red", opacity=0.2, line_width=1)
ax.legend(bbox_to_anchor=(3,3))
fig.show()
Player_Att_numric=Player_Attributes.iloc[:,np.r_[2,3,4,5, 9 :42] ]
merged_plyerName_plyerAtt = players.merge(Player_Att_numric, on="player_api_id",suffixes =("_name","_att"))
spanish_teams_df = Team_df.loc[(Team_df['id'] >= 43035) & (Team_df['id'] <= 48358 )]
check for null values for both DF
spanish_teams_df.isnull().values.any()
False
Team_Attributes_df.isnull().values.any()
False
Team_Attributes_df.duplicated().sum()
0
spanish_teams_df.duplicated().sum()
0
names_of_spanish_teams_df =spanish_teams_df.iloc[: , np.r_ [1,3 ]]
names_of_spanish_teams_df.isnull().values.any()
False
spanish_att_df = names_of_spanish_teams_df.merge(Team_Attributes_df, on="team_api_id",suffixes =("_am","_att"))
spanish_att_df.head(3)
| team_api_id | team_long_name | id | team_fifa_api_id | date | buildUpPlaySpeed | buildUpPlaySpeedClass | buildUpPlayDribblingClass | buildUpPlayPassing | buildUpPlayPassingClass | buildUpPlayPositioningClass | chanceCreationPassing | chanceCreationPassingClass | chanceCreationCrossing | chanceCreationCrossingClass | chanceCreationShooting | chanceCreationShootingClass | chanceCreationPositioningClass | defencePressure | defencePressureClass | defenceAggression | defenceAggressionClass | defenceTeamWidth | defenceTeamWidthClass | defenceDefenderLineClass | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10267 | Valencia CF | 1308 | 461 | 2010-02-22 00:00:00 | 30 | Slow | Little | 30 | Short | Free Form | 55 | Normal | 60 | Normal | 70 | Lots | Organised | 55 | Medium | 60 | Press | 60 | Normal | Offside Trap |
| 1 | 10267 | Valencia CF | 1309 | 461 | 2011-02-22 00:00:00 | 41 | Balanced | Little | 40 | Mixed | Organised | 56 | Normal | 55 | Normal | 64 | Normal | Free Form | 40 | Medium | 48 | Press | 54 | Normal | Cover |
| 2 | 10267 | Valencia CF | 1310 | 461 | 2012-02-22 00:00:00 | 20 | Slow | Little | 35 | Mixed | Organised | 38 | Normal | 56 | Normal | 55 | Normal | Organised | 51 | Medium | 38 | Press | 59 | Normal | Cover |
spanish_att_df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 187 entries, 0 to 186 Data columns (total 25 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 team_api_id 187 non-null int64 1 team_long_name 187 non-null object 2 id 187 non-null int64 3 team_fifa_api_id 187 non-null int64 4 date 187 non-null object 5 buildUpPlaySpeed 187 non-null int64 6 buildUpPlaySpeedClass 187 non-null object 7 buildUpPlayDribblingClass 187 non-null object 8 buildUpPlayPassing 187 non-null int64 9 buildUpPlayPassingClass 187 non-null object 10 buildUpPlayPositioningClass 187 non-null object 11 chanceCreationPassing 187 non-null int64 12 chanceCreationPassingClass 187 non-null object 13 chanceCreationCrossing 187 non-null int64 14 chanceCreationCrossingClass 187 non-null object 15 chanceCreationShooting 187 non-null int64 16 chanceCreationShootingClass 187 non-null object 17 chanceCreationPositioningClass 187 non-null object 18 defencePressure 187 non-null int64 19 defencePressureClass 187 non-null object 20 defenceAggression 187 non-null int64 21 defenceAggressionClass 187 non-null object 22 defenceTeamWidth 187 non-null int64 23 defenceTeamWidthClass 187 non-null object 24 defenceDefenderLineClass 187 non-null object dtypes: int64(11), object(14) memory usage: 38.0+ KB
spanish_numric_att_list = ["buildUpPlaySpeed","buildUpPlayPassing","chanceCreationPassing","chanceCreationCrossing","chanceCreationShooting","defencePressure","defenceAggression","defenceTeamWidth"]
spanish_att_df[spanish_numric_att_list].head(2)
| buildUpPlaySpeed | buildUpPlayPassing | chanceCreationPassing | chanceCreationCrossing | chanceCreationShooting | defencePressure | defenceAggression | defenceTeamWidth | |
|---|---|---|---|---|---|---|---|---|
| 0 | 30 | 30 | 55 | 60 | 70 | 55 | 60 | 60 |
| 1 | 41 | 40 | 56 | 55 | 64 | 40 | 48 | 54 |
spanish_att_num_df =spanish_att_df[spanish_numric_att_list]
spanish_att_num_df.describe()
| buildUpPlaySpeed | buildUpPlayPassing | chanceCreationPassing | chanceCreationCrossing | chanceCreationShooting | defencePressure | defenceAggression | defenceTeamWidth | |
|---|---|---|---|---|---|---|---|---|
| count | 187.000000 | 187.000000 | 187.000000 | 187.000000 | 187.000000 | 187.000000 | 187.000000 | 187.000000 |
| mean | 47.368984 | 45.839572 | 53.171123 | 53.716578 | 54.941176 | 47.828877 | 48.566845 | 55.850267 |
| std | 11.169645 | 10.997357 | 10.429622 | 10.835520 | 10.259628 | 9.231935 | 8.681929 | 8.793878 |
| min | 20.000000 | 23.000000 | 30.000000 | 24.000000 | 23.000000 | 30.000000 | 30.000000 | 30.000000 |
| 25% | 40.000000 | 37.000000 | 45.000000 | 45.000000 | 50.000000 | 40.000000 | 41.500000 | 50.000000 |
| 50% | 48.000000 | 44.000000 | 54.000000 | 54.000000 | 54.000000 | 49.000000 | 49.000000 | 56.000000 |
| 75% | 55.000000 | 54.000000 | 59.000000 | 60.000000 | 60.000000 | 52.000000 | 53.000000 | 62.000000 |
| max | 71.000000 | 73.000000 | 73.000000 | 78.000000 | 79.000000 | 70.000000 | 70.000000 | 70.000000 |
plt.figure(figsize=(6,6), dpi= 180)
fig, ax = plt.subplots();
spanish_att_df.chanceCreationPassing.hist(bins=9);
ax.set_title("Spinsh teams chance Creation Passing")
ax.set_ylabel('frequency')
ax.set_xlabel('chance Creation Passing grade');
<Figure size 1080x1080 with 0 Axes>
plt.figure(figsize=(6,6), dpi= 180)
fig, ax = plt.subplots();
spanish_att_df.chanceCreationShooting.hist(bins=5);
ax.set_title("Spinsh teams chance Creation Shooting")
ax.set_ylabel('frequency')
ax.set_xlabel('chance Creation Shooting grade');
<Figure size 1080x1080 with 0 Axes>
spanish_att_df.defenceTeamWidth
0 60
1 54
2 59
3 59
4 59
..
182 67
183 57
184 57
185 57
186 57
Name: defenceTeamWidth, Length: 187, dtype: int64
plt.figure(figsize=(6,6), dpi= 180)
fig, ax = plt.subplots();
spanish_att_df.defencePressure.hist(bins=9);
ax.set_title("Spinsh teams defence Pressure")
ax.set_ylabel('frequency')
ax.set_xlabel('defence Pressure grade');
<Figure size 1080x1080 with 0 Axes>
plt.figure(figsize=(6,6), dpi= 180)
fig, ax = plt.subplots();
spanish_att_df.defenceAggression.hist(bins=9);
ax.set_title("Spinsh teams defence Aggression")
ax.set_ylabel('frequency')
ax.set_xlabel('defence Aggressio grade');
<Figure size 1080x1080 with 0 Axes>
plt.figure(figsize=(6,6), dpi= 180)
fig, ax = plt.subplots();
spanish_att_df.defenceAggression.hist(bins=9);
ax.set_title("Spinsh teams defence Aggression")
ax.set_ylabel('frequency')
ax.set_xlabel('defence Aggression grade');
<Figure size 1080x1080 with 0 Axes>
total_df=spanish_att_df.groupby("team_long_name")[ spanish_numric_att_list].sum()
total_df.head(1)
| buildUpPlaySpeed | buildUpPlayPassing | chanceCreationPassing | chanceCreationCrossing | chanceCreationShooting | defencePressure | defenceAggression | defenceTeamWidth | |
|---|---|---|---|---|---|---|---|---|
| team_long_name | ||||||||
| Athletic Club de Bilbao | 310 | 288 | 280 | 421 | 281 | 238 | 374 | 341 |
total_df["sum"] =total_df.sum(axis=1)
total_df["sum"]
#total_df["sum"].sort_values(ascending= False)
team_long_name Athletic Club de Bilbao 2533 Atlético Madrid 2624 CA Osasuna 2560 CD Numancia 2403 CD Tenerife 1672 Córdoba CF 2174 Elche CF 2485 FC Barcelona 2320 Getafe CF 2189 Granada CF 2221 Hércules Club de Fútbol 2100 Levante UD 2515 Málaga CF 2499 RC Celta de Vigo 2479 RC Deportivo de La Coruña 2392 RC Recreativo 2190 RCD Espanyol 2422 RCD Mallorca 2513 Racing Santander 2168 Rayo Vallecano 2356 Real Betis Balompié 2464 Real Madrid CF 2684 Real Sociedad 2398 Real Sporting de Gijón 2591 Real Valladolid 2494 Real Zaragoza 2240 SD Eibar 857 Sevilla FC 2646 UD Almería 2505 UD Las Palmas 2364 Valencia CF 2247 Villarreal CF 2295 Xerez Club Deportivo 1562 Name: sum, dtype: int64
TOp_TEN =total_df["sum"].sort_values(ascending= False).head(10)
TOp_TEN
team_long_name Real Madrid CF 2684 Sevilla FC 2646 Atlético Madrid 2624 Real Sporting de Gijón 2591 CA Osasuna 2560 Athletic Club de Bilbao 2533 Levante UD 2515 RCD Mallorca 2513 UD Almería 2505 Málaga CF 2499 Name: sum, dtype: int64
plt.rcParams.update({'font.size': 19})
plt.figure(figsize=(15,12), dpi= 180)
TOp_TEN .plot(kind='bar', title='Spanish Top ten in total attributes 2008:2016', ylabel='total greads over the period',
xlabel='teams names', figsize=(6, 5));
total_df["sum"].sort_values(ascending= False).head(21)
team_long_name Real Madrid CF 2684 Sevilla FC 2646 Atlético Madrid 2624 Real Sporting de Gijón 2591 CA Osasuna 2560 Athletic Club de Bilbao 2533 Levante UD 2515 RCD Mallorca 2513 UD Almería 2505 Málaga CF 2499 Real Valladolid 2494 Elche CF 2485 RC Celta de Vigo 2479 Real Betis Balompié 2464 RCD Espanyol 2422 CD Numancia 2403 Real Sociedad 2398 RC Deportivo de La Coruña 2392 UD Las Palmas 2364 Rayo Vallecano 2356 FC Barcelona 2320 Name: sum, dtype: int64
spanish_numric_att_limit_list = ["buildUpPlaySpeed","buildUpPlayPassing","chanceCreationPassing","chanceCreationShooting"]
limted_total_df=spanish_att_df.groupby("team_long_name")[ spanish_numric_att_limit_list].sum()
limted_total_df
| buildUpPlaySpeed | buildUpPlayPassing | chanceCreationPassing | chanceCreationShooting | |
|---|---|---|---|---|
| team_long_name | ||||
| Athletic Club de Bilbao | 310 | 288 | 280 | 281 |
| Atlético Madrid | 319 | 287 | 388 | 360 |
| CA Osasuna | 238 | 297 | 321 | 339 |
| CD Numancia | 302 | 273 | 304 | 336 |
| CD Tenerife | 198 | 195 | 225 | 233 |
| Córdoba CF | 242 | 278 | 250 | 307 |
| Elche CF | 289 | 332 | 309 | 344 |
| FC Barcelona | 215 | 204 | 271 | 318 |
| Getafe CF | 207 | 238 | 302 | 332 |
| Granada CF | 262 | 222 | 326 | 262 |
| Hércules Club de Fútbol | 253 | 247 | 283 | 309 |
| Levante UD | 327 | 340 | 350 | 349 |
| Málaga CF | 291 | 253 | 348 | 331 |
| RC Celta de Vigo | 292 | 298 | 316 | 336 |
| RC Deportivo de La Coruña | 307 | 257 | 239 | 385 |
| RC Recreativo | 255 | 303 | 268 | 220 |
| RCD Espanyol | 319 | 271 | 350 | 284 |
| RCD Mallorca | 337 | 320 | 317 | 315 |
| Racing Santander | 236 | 246 | 318 | 257 |
| Rayo Vallecano | 253 | 261 | 255 | 319 |
| Real Betis Balompié | 314 | 244 | 334 | 339 |
| Real Madrid CF | 304 | 232 | 405 | 414 |
| Real Sociedad | 241 | 231 | 338 | 359 |
| Real Sporting de Gijón | 354 | 287 | 398 | 305 |
| Real Valladolid | 317 | 322 | 302 | 302 |
| Real Zaragoza | 235 | 201 | 307 | 340 |
| SD Eibar | 124 | 104 | 119 | 116 |
| Sevilla FC | 342 | 336 | 350 | 315 |
| UD Almería | 300 | 363 | 330 | 299 |
| UD Las Palmas | 276 | 262 | 324 | 341 |
| Valencia CF | 176 | 210 | 262 | 357 |
| Villarreal CF | 233 | 194 | 273 | 369 |
| Xerez Club Deportivo | 190 | 176 | 181 | 201 |
limted_total_df["sum"] =limted_total_df.sum(axis=1)
limted_total_df["sum"].sort_values(ascending= False)
team_long_name Levante UD 1366 Real Madrid CF 1355 Atlético Madrid 1354 Real Sporting de Gijón 1344 Sevilla FC 1343 UD Almería 1292 RCD Mallorca 1289 Elche CF 1274 Real Valladolid 1243 RC Celta de Vigo 1242 Real Betis Balompié 1231 RCD Espanyol 1224 Málaga CF 1223 CD Numancia 1215 UD Las Palmas 1203 CA Osasuna 1195 RC Deportivo de La Coruña 1188 Real Sociedad 1169 Athletic Club de Bilbao 1159 Hércules Club de Fútbol 1092 Rayo Vallecano 1088 Real Zaragoza 1083 Getafe CF 1079 Córdoba CF 1077 Granada CF 1072 Villarreal CF 1069 Racing Santander 1057 RC Recreativo 1046 FC Barcelona 1008 Valencia CF 1005 CD Tenerife 851 Xerez Club Deportivo 748 SD Eibar 463 Name: sum, dtype: int64
TOp_limt_TEN =limted_total_df["sum"].sort_values(ascending= False).head(10)
TOp_limt_TEN
team_long_name Levante UD 1366 Real Madrid CF 1355 Atlético Madrid 1354 Real Sporting de Gijón 1344 Sevilla FC 1343 UD Almería 1292 RCD Mallorca 1289 Elche CF 1274 Real Valladolid 1243 RC Celta de Vigo 1242 Name: sum, dtype: int64
plt.rcParams.update({'font.size': 19})
plt.figure(figsize=(15,12), dpi= 180)
TOp_limt_TEN.plot(kind='barh', title='Spanish Top ten Limited List in total attributes 2008:2016', ylabel='total greads over the period',
xlabel='teams names', figsize=(6, 5));
limted_total_df["sum"].sort_values(ascending= False).head(29)
team_long_name Levante UD 1366 Real Madrid CF 1355 Atlético Madrid 1354 Real Sporting de Gijón 1344 Sevilla FC 1343 UD Almería 1292 RCD Mallorca 1289 Elche CF 1274 Real Valladolid 1243 RC Celta de Vigo 1242 Real Betis Balompié 1231 RCD Espanyol 1224 Málaga CF 1223 CD Numancia 1215 UD Las Palmas 1203 CA Osasuna 1195 RC Deportivo de La Coruña 1188 Real Sociedad 1169 Athletic Club de Bilbao 1159 Hércules Club de Fútbol 1092 Rayo Vallecano 1088 Real Zaragoza 1083 Getafe CF 1079 Córdoba CF 1077 Granada CF 1072 Villarreal CF 1069 Racing Santander 1057 RC Recreativo 1046 FC Barcelona 1008 Name: sum, dtype: int64
limted_total_df.describe()
| buildUpPlaySpeed | buildUpPlayPassing | chanceCreationPassing | chanceCreationShooting | sum | |
|---|---|---|---|---|---|
| count | 33.000000 | 33.000000 | 33.000000 | 33.000000 | 33.000000 |
| mean | 268.424242 | 259.757576 | 301.303030 | 311.333333 | 1140.818182 |
| std | 53.680205 | 54.845254 | 58.592067 | 57.570971 | 187.168449 |
| min | 124.000000 | 104.000000 | 119.000000 | 116.000000 | 463.000000 |
| 25% | 236.000000 | 231.000000 | 271.000000 | 299.000000 | 1072.000000 |
| 50% | 276.000000 | 261.000000 | 309.000000 | 319.000000 | 1188.000000 |
| 75% | 310.000000 | 297.000000 | 334.000000 | 341.000000 | 1243.000000 |
| max | 354.000000 | 363.000000 | 405.000000 | 414.000000 | 1366.000000 |
buildUpPlaySpeed mean = 268 < median = 276
buildUpPlayPassing mean = 260 < median = 261
chanceCreationPassing mean = 301 < median = 309
chanceCreationShooting mean = 311 < median = 319
limted_total_df["sum"] =limted_total_df.sum(axis=1)
limted_total_sum_df=limted_total_df.iloc[ :,:].sort_values(by="sum",ascending= False)
limted_total_sum_df.head(29).style.bar()
| buildUpPlaySpeed | buildUpPlayPassing | chanceCreationPassing | chanceCreationShooting | sum | |
|---|---|---|---|---|---|
| team_long_name | |||||
| Levante UD | 327 | 340 | 350 | 349 | 2732 |
| Real Madrid CF | 304 | 232 | 405 | 414 | 2710 |
| Atlético Madrid | 319 | 287 | 388 | 360 | 2708 |
| Real Sporting de Gijón | 354 | 287 | 398 | 305 | 2688 |
| Sevilla FC | 342 | 336 | 350 | 315 | 2686 |
| UD Almería | 300 | 363 | 330 | 299 | 2584 |
| RCD Mallorca | 337 | 320 | 317 | 315 | 2578 |
| Elche CF | 289 | 332 | 309 | 344 | 2548 |
| Real Valladolid | 317 | 322 | 302 | 302 | 2486 |
| RC Celta de Vigo | 292 | 298 | 316 | 336 | 2484 |
| Real Betis Balompié | 314 | 244 | 334 | 339 | 2462 |
| RCD Espanyol | 319 | 271 | 350 | 284 | 2448 |
| Málaga CF | 291 | 253 | 348 | 331 | 2446 |
| CD Numancia | 302 | 273 | 304 | 336 | 2430 |
| UD Las Palmas | 276 | 262 | 324 | 341 | 2406 |
| CA Osasuna | 238 | 297 | 321 | 339 | 2390 |
| RC Deportivo de La Coruña | 307 | 257 | 239 | 385 | 2376 |
| Real Sociedad | 241 | 231 | 338 | 359 | 2338 |
| Athletic Club de Bilbao | 310 | 288 | 280 | 281 | 2318 |
| Hércules Club de Fútbol | 253 | 247 | 283 | 309 | 2184 |
| Rayo Vallecano | 253 | 261 | 255 | 319 | 2176 |
| Real Zaragoza | 235 | 201 | 307 | 340 | 2166 |
| Getafe CF | 207 | 238 | 302 | 332 | 2158 |
| Córdoba CF | 242 | 278 | 250 | 307 | 2154 |
| Granada CF | 262 | 222 | 326 | 262 | 2144 |
| Villarreal CF | 233 | 194 | 273 | 369 | 2138 |
| Racing Santander | 236 | 246 | 318 | 257 | 2114 |
| RC Recreativo | 255 | 303 | 268 | 220 | 2092 |
| FC Barcelona | 215 | 204 | 271 | 318 | 2016 |
match_df=match_df.iloc[: , np.r_ [2:11 ]]
spanish_match_df = match_df[match_df["league_id"]==21518]
spanish_match_df.head(1)
| league_id | season | stage | date | match_api_id | home_team_api_id | away_team_api_id | home_team_goal | away_team_goal | |
|---|---|---|---|---|---|---|---|---|---|
| 21517 | 21518 | 2008/2009 | 1 | 2008-08-30 00:00:00 | 530023 | 10267 | 8661 | 3 | 0 |
spanish_match_df.shape
(3040, 9)
spanish_match_df.isnull().values.any()
False
spanish_match_df.duplicated().sum()
0
home_names_of_spanish_teams_df=names_of_spanish_teams_df.copy()
home_names_of_spanish_teams_df.rename(columns={'team_api_id': 'home_team_api_id'}, inplace=True)
home_names_of_spanish_teams_df.head(1)
| home_team_api_id | team_long_name | |
|---|---|---|
| 251 | 10267 | Valencia CF |
spanish_match_df.loc[:-2,"home_point"] = 0
spanish_match_df["home_point"]= np.select(condlist=[spanish_match_df["home_team_goal"] > spanish_match_df["away_team_goal"],
spanish_match_df["home_team_goal"] == spanish_match_df["away_team_goal"]],
choicelist=[3,1],default=0)
spanish_match_df
| league_id | season | stage | date | match_api_id | home_team_api_id | away_team_api_id | home_team_goal | away_team_goal | home_point | |
|---|---|---|---|---|---|---|---|---|---|---|
| 21517 | 21518 | 2008/2009 | 1 | 2008-08-30 00:00:00 | 530023 | 10267 | 8661 | 3 | 0 | 3 |
| 21518 | 21518 | 2008/2009 | 1 | 2008-08-31 00:00:00 | 530084 | 8371 | 10205 | 1 | 1 | 1 |
| 21519 | 21518 | 2008/2009 | 1 | 2008-08-31 00:00:00 | 530085 | 9783 | 8633 | 2 | 1 | 3 |
| 21520 | 21518 | 2008/2009 | 1 | 2008-08-31 00:00:00 | 530086 | 8388 | 8634 | 1 | 0 | 3 |
| 21521 | 21518 | 2008/2009 | 1 | 2008-08-31 00:00:00 | 530087 | 8696 | 8302 | 1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 24552 | 21518 | 2015/2016 | 9 | 2015-10-25 00:00:00 | 2030167 | 9906 | 10267 | 2 | 1 | 3 |
| 24553 | 21518 | 2015/2016 | 9 | 2015-10-24 00:00:00 | 2030168 | 9864 | 9783 | 2 | 0 | 3 |
| 24554 | 21518 | 2015/2016 | 9 | 2015-10-26 00:00:00 | 2030169 | 8315 | 9869 | 3 | 0 | 3 |
| 24555 | 21518 | 2015/2016 | 9 | 2015-10-24 00:00:00 | 2030170 | 7878 | 8603 | 1 | 1 | 1 |
| 24556 | 21518 | 2015/2016 | 9 | 2015-10-23 00:00:00 | 2030171 | 8370 | 8558 | 3 | 0 | 3 |
3040 rows × 10 columns
home_spanish_match_df = spanish_match_df.iloc[: , np.r_ [1,3,5,7,9 ]]
home_spanish_match_df.head(1)
| season | date | home_team_api_id | home_team_goal | home_point | |
|---|---|---|---|---|---|
| 21517 | 2008/2009 | 2008-08-30 00:00:00 | 10267 | 3 | 3 |
result_home_spanish_df =home_names_of_spanish_teams_df.merge(home_spanish_match_df, on="home_team_api_id")
# parssing the date
result_home_spanish_df["date"]= pd.to_datetime(result_home_spanish_df["date"])
spanish_match_df.head(2)
| league_id | season | stage | date | match_api_id | home_team_api_id | away_team_api_id | home_team_goal | away_team_goal | home_point | |
|---|---|---|---|---|---|---|---|---|---|---|
| 21517 | 21518 | 2008/2009 | 1 | 2008-08-30 00:00:00 | 530023 | 10267 | 8661 | 3 | 0 | 3 |
| 21518 | 21518 | 2008/2009 | 1 | 2008-08-31 00:00:00 | 530084 | 8371 | 10205 | 1 | 1 | 1 |
goals_per_season =spanish_match_df.groupby("season")[ ["home_team_goal","away_team_goal"]].sum()
goals_per_season
| home_team_goal | away_team_goal | |
|---|---|---|
| season | ||
| 2008/2009 | 631 | 470 |
| 2009/2010 | 608 | 423 |
| 2010/2011 | 622 | 420 |
| 2011/2012 | 638 | 412 |
| 2012/2013 | 641 | 450 |
| 2013/2014 | 620 | 425 |
| 2014/2015 | 584 | 425 |
| 2015/2016 | 615 | 428 |
fig, ax = plt.subplots(figsize=(15,6), dpi= 180)
goals_per_season["home_team_goal"].plot(ax=ax)
goals_per_season["away_team_goal"].plot(ax=ax)
plt.title("The spanish home vs. away goals 2008:2016")
plt.xlabel("Season")
plt.ylabel("POINTS")
plt.show()
result_home_spanish_df.head(2)
| home_team_api_id | team_long_name | season | date | home_team_goal | home_point | |
|---|---|---|---|---|---|---|
| 0 | 10267 | Valencia CF | 2008/2009 | 2008-08-30 | 3 | 3 |
| 1 | 10267 | Valencia CF | 2008/2009 | 2008-11-15 | 2 | 0 |
goals_home_per_teams =result_home_spanish_df.groupby("team_long_name")["home_team_goal"].sum().sort_values(ascending= False).head(10)
goals_home_per_teams
team_long_name Real Madrid CF 505 FC Barcelona 495 Atlético Madrid 321 Valencia CF 299 Sevilla FC 285 Athletic Club de Bilbao 250 Málaga CF 222 Villarreal CF 218 RCD Espanyol 205 Getafe CF 202 Name: home_team_goal, dtype: int64
goals_home_per_teams.describe()
count 10.000000 mean 300.200000 std 112.868458 min 202.000000 25% 219.000000 50% 267.500000 75% 315.500000 max 505.000000 Name: home_team_goal, dtype: float64
title_s ='Spanish Top ten as home goals 2008:2016'
ylab = 'total goals over the period'
goals_home_per_teams.plot(kind='bar', title =title_s, ylabel= ylab,xlabel='Home Team', figsize=(6, 5));
#top ten team home points
Hr = result_home_spanish_df.groupby(["team_long_name"])["home_point"].sum().sort_values(ascending= False )#.unstack().plot(kind="bar")
Hr.iloc[:11]
team_long_name FC Barcelona 405 Real Madrid CF 398 Atlético Madrid 333 Valencia CF 304 Sevilla FC 300 Athletic Club de Bilbao 271 Villarreal CF 261 Málaga CF 241 RCD Espanyol 241 Getafe CF 225 Real Sociedad 200 Name: home_point, dtype: int32
Hr.describe()
count 33.000000 mean 156.333333 std 112.506574 min 9.000000 25% 47.000000 50% 133.000000 75% 241.000000 max 405.000000 Name: home_point, dtype: float64
from describe() we notice ateam have only 9 points at home in 8 sessions
Hr_di = result_home_spanish_df.groupby(["team_long_name"])["home_point"].sum().sort_values(ascending= False )
Hr_di.iloc[-1:]
team_long_name Córdoba CF 9 Name: home_point, dtype: int32
Hr_di=Hr_di.iloc[:11].sort_values(ascending= False );
title_s ='Spanish Top ten as home points 2008:2016'
ylab = 'total points over the period'
Hr_di.plot(kind='bar', title =title_s, ylabel= ylab,xlabel='Home Team', figsize=(6, 5))
<AxesSubplot:title={'center':'Spanish Top ten as home points 2008:2016'}, xlabel='Home Team', ylabel='total points over the period'>
away_names_of_spanish_teams_df =names_of_spanish_teams_df.copy()
away_names_of_spanish_teams_df.rename(columns={'team_api_id': 'away_team_api_id'}, inplace=True)
away_names_of_spanish_teams_df.head(1)
| away_team_api_id | team_long_name | |
|---|---|---|
| 251 | 10267 | Valencia CF |
spanish_match_df.loc[:-2,"away_point"] = 0
spanish_match_df["away_point"]= np.select(condlist=[spanish_match_df["home_team_goal"] < spanish_match_df["away_team_goal"],
spanish_match_df["home_team_goal"] == spanish_match_df["away_team_goal"]],choicelist=[3,1],default=0)
away_spanish_match_df = spanish_match_df.iloc[: , np.r_ [1,3,6,8,10 ]]
away_spanish_match_df.head(1)
| season | date | away_team_api_id | away_team_goal | away_point | |
|---|---|---|---|---|---|
| 21517 | 2008/2009 | 2008-08-30 00:00:00 | 8661 | 0 | 0 |
result_away_spanish_df = away_names_of_spanish_teams_df.merge(away_spanish_match_df, on="away_team_api_id")
# parssing the date
result_away_spanish_df["date"]= pd.to_datetime(result_away_spanish_df["date"])
Ar=result_away_spanish_df.groupby(["team_long_name"] )["away_point"].sum().sort_values(ascending= False)
Ar =Ar.iloc[:11].sort_values(ascending= False );
title_s ='Spanish Top ten as away points 2008:2016'
ylab = 'total points over the period'
Ar.plot(kind='bar', title =title_s, ylabel= ylab,xlabel='away Team', figsize=(6, 5))
<AxesSubplot:title={'center':'Spanish Top ten as away points 2008:2016'}, xlabel='away Team', ylabel='total points over the period'>
goals_away_per_teams=result_away_spanish_df.groupby(["team_long_name"] )["away_team_goal"].sum().sort_values(ascending= False)
goals_away_per_teams= goals_away_per_teams.iloc[:11].sort_values(ascending= False );
title_s ='Spanish Top ten as away goals 2008:2016'
ylab = 'total goals over the period'
goals_away_per_teams.plot(kind='bar', title =title_s, ylabel= ylab,xlabel='away Team', figsize=(6, 5));
Ar.describe()
count 11.000000 mean 190.545455 std 77.266246 min 111.000000 25% 137.000000 50% 166.000000 75% 211.500000 max 340.000000 Name: away_point, dtype: float64
goals_away_per_teams.describe()
count 11.000000 mean 195.000000 std 78.467828 min 130.000000 25% 144.500000 50% 165.000000 75% 198.000000 max 354.000000 Name: away_team_goal, dtype: float64
Ar=result_away_spanish_df.groupby(["team_long_name"] )["away_point"].sum().sort_values(ascending= True)
Ar.iloc[:1]
team_long_name CD Numancia 5 Name: away_point, dtype: int32
top_three = ["Atlético Madrid","Real Madrid CF", "FC Barcelona"]
# select only the 3 teams results from home goal & point
home_top_3 =result_home_spanish_df.query(f'team_long_name in {top_three}')
home_top_3.head(1)
| home_team_api_id | team_long_name | season | date | home_team_goal | home_point | |
|---|---|---|---|---|---|---|
| 608 | 8633 | Real Madrid CF | 2008/2009 | 2008-11-08 | 4 | 3 |
Hr_home_top_3 = home_top_3.groupby(["season","team_long_name"])["home_point"].sum()
Hr_home_top_3
season team_long_name
2008/2009 Atlético Madrid 40
FC Barcelona 45
Real Madrid CF 44
2009/2010 Atlético Madrid 36
FC Barcelona 55
Real Madrid CF 54
2010/2011 Atlético Madrid 33
FC Barcelona 50
Real Madrid CF 49
2011/2012 Atlético Madrid 38
FC Barcelona 52
Real Madrid CF 50
2012/2013 Atlético Madrid 44
FC Barcelona 55
Real Madrid CF 53
2013/2014 Atlético Madrid 49
FC Barcelona 50
Real Madrid CF 49
2014/2015 Atlético Madrid 45
FC Barcelona 49
Real Madrid CF 50
2015/2016 Atlético Madrid 48
FC Barcelona 49
Real Madrid CF 49
Name: home_point, dtype: int32
top_three = ["Atlético Madrid","Real Madrid CF", "FC Barcelona"]
away_top_3 =result_away_spanish_df.query(f'team_long_name in {top_three}')
away_top_3.head(1)
| away_team_api_id | team_long_name | season | date | away_team_goal | away_point | |
|---|---|---|---|---|---|---|
| 608 | 8633 | Real Madrid CF | 2008/2009 | 2008-08-31 | 1 | 0 |
Ar_away_top_3 = away_top_3.groupby(["season","team_long_name"])["away_point"].sum()
Ar_away_top_3
season team_long_name
2008/2009 Atlético Madrid 27
FC Barcelona 42
Real Madrid CF 34
2009/2010 Atlético Madrid 11
FC Barcelona 44
Real Madrid CF 42
2010/2011 Atlético Madrid 25
FC Barcelona 46
Real Madrid CF 43
2011/2012 Atlético Madrid 18
FC Barcelona 39
Real Madrid CF 50
2012/2013 Atlético Madrid 32
FC Barcelona 45
Real Madrid CF 32
2013/2014 Atlético Madrid 41
FC Barcelona 37
Real Madrid CF 38
2014/2015 Atlético Madrid 33
FC Barcelona 45
Real Madrid CF 42
2015/2016 Atlético Madrid 40
FC Barcelona 42
Real Madrid CF 41
Name: away_point, dtype: int32
total_home_top_3 = home_top_3.iloc[: , np.r_ [1,2,3,5 ]]
total_home_top_3_1=total_home_top_3.rename({'home_point': 'points'}, axis=1)
total_away_top_3 =away_top_3.iloc[: , np.r_ [1,2,3,5 ]]
total_away_top_3_2 =total_away_top_3.rename({'away_point': 'points'}, axis=1)
# concat the two df hom and away for top three
total_top_3_all =pd.concat([total_home_top_3_1, total_away_top_3_2], axis=0)
total_top_3_all
| team_long_name | season | date | points | |
|---|---|---|---|---|
| 608 | Real Madrid CF | 2008/2009 | 2008-11-08 | 3 |
| 609 | Real Madrid CF | 2008/2009 | 2008-11-22 | 3 |
| 610 | Real Madrid CF | 2008/2009 | 2008-12-07 | 0 |
| 611 | Real Madrid CF | 2008/2009 | 2008-12-20 | 3 |
| 612 | Real Madrid CF | 2008/2009 | 2009-01-04 | 3 |
| ... | ... | ... | ... | ... |
| 2142 | Atlético Madrid | 2015/2016 | 2016-04-20 | 3 |
| 2143 | Atlético Madrid | 2015/2016 | 2016-05-08 | 0 |
| 2144 | Atlético Madrid | 2015/2016 | 2015-09-19 | 3 |
| 2145 | Atlético Madrid | 2015/2016 | 2015-09-26 | 0 |
| 2146 | Atlético Madrid | 2015/2016 | 2015-10-18 | 3 |
912 rows × 4 columns
fig, ax = plt.subplots(figsize=(14,5), dpi= 180)
total_top_3_all.groupby(["season" ,"team_long_name" ])[ "points"].sum().unstack().plot(ax=ax)
plt.title("The spanish top tree teams 2008:2016")
plt.xlabel("Seasons")
plt.ylabel("POINTS")
plt.show()
plt.tight_layout();
<Figure size 432x288 with 0 Axes>
remarkable journey from the 9th in 2009 to the champion in 2013 Atlético Madrid was the champion la Liga 2013/2014 Atlético Madrid 90 FC Barcelona 87 Real Madrid CF 87 Atlético repeated his achievement again and won the championship but after 2016 which out analysis scope
ttt_groubed=total_top_3_all.groupby(["team_long_name","season"])["points"].sum();
plt.rcParams.update({'font.size': 19})
ax =ttt_groubed.unstack().plot(kind ="barh" , figsize=(14,15));
ax.legend(bbox_to_anchor=(1,1))
plt.title("The spanish top tree teams 2008:2016")
plt.ylabel("teams")
plt.xlabel("POINTS")
plt.show()
plt.tight_layout();
<Figure size 432x288 with 0 Axes>
www.geeksforgeeks.org www.stackoverflow.com www.analyticsvidhya.com